

A vivid image of the recent evolution of computer technology is that of a "race" between
function and usability. New technologies, new capabilities become available to users faster than
user problems can be studied, understood and addressed. For example, the many user studies of
word processing applications carried out over the past decade focused their attention on
keyboard-oriented, stand-alone systems with small and low-resolution monochrome displays. In 1981, our
group  at  the  Watson  Research  Center  turned  attention  to  secretaries  learning  to  use  such  word 
processing  applications.  At  the  time,  this  was  a  novel  application;  computer  editing  was  still 
largely  the  province  of  programmers  revising  code. 

But now, and without a finished analysis of word processing, the frontier of usability has
been  pressed  onward  by  the  development  and  introduction  of  new  applications  and  new  interface 
technologies.  Communication  applications  such  as  electronic  mail  and  computer  conference 
support raise usability challenges far more diverse than those raised by the extension of word
processing  to  nonprogrammers.  In  the  current  technology,  multiple  users  cooperatively  access 
multiple  applications  via  an  extremely  heterogeneous  collection  of  workstation  types.  And  even 
as the usability issues in these new domains are being articulated and explored, leading-edge
prototypes  are  introducing  gestural  (e.g.,  handwriting)  and  speech  input  and  interactive  video 
output.  Such  new  developments  are  occurring  more  rapidly,  more  broadly  across  the  industry, 
and  impacting  more  users  all  the  time. 

The  race  between  function  and  usability  has  made  the  area  of  human-computer  interaction 
(or HCI) a very high-profile research area within computer science and within the computer
industry. It is difficult to develop usability science and technology fast enough, but it is also
critical to do so. Indeed, the race has created the need for chapters like this one. However, this
attention has also helped to expose some fundamental perplexity about what the field is and how
it is supposed to work. It is still the case that HCI research has its principal effect on discussions
of  usability  and  user  interface  design  and  only  a  small,  derived  effect  on  actual  practice  in  the 
design  and  development  of  computer  systems  and  applications. 

What is the goal of HCI research? There need not be a single answer to this question. But
the more answers there are, and the more irreconcilable the various answers are, the more
fragmented the field will appear. In HCI there are many answers to this question. One
traditional answer comes from the field of Human Factors: HCI needs to provide methods and
metrics for evaluating the usability of computers. A second answer comes from Cognitive
Science: HCI is a testbed for the application of cognitive psychology to a real problem domain.
A third answer comes from the exigencies of the computing industry: HCI must help guide the
definition,  invention  and  introduction  of  new  computing  tools  and  environments. 

The practice of HCI is even more fragmented than its goals might imply. For example,
some varieties of human factors evaluation explicitly suggest that developing cognitive science
theories of HCI may impair progress in understanding usability (Whiteside and Wixon, 1987). On
the other hand, Newell and Card (1985) warn that psychology might be driven out of HCI by
computer science unless it can develop predictive cognitive models, coining the slogan "hard
science drives out the soft." Yet even the most developed cognitive models in HCI have had no
significant impact on the design of user interfaces (Carroll and Campbell, l^JSb). Moreover, it is
paradoxically true that product innovations in user interface design have generally led HCI
research rather than following from it in the conventionally assumed flow of "technology transfer"
from Research to Development. The recent impact of the Apple Macintosh illustrates this.

Perhaps these conflicting and fragmented views of HCI can be understood as consequences
of the race between function and usability, of the rapid growth in needs, activities and
expectations. Perhaps the current perplexity about HCI reflects an intermediate state in a true
evolution toward more effective approaches to understanding the usability of computer systems
and applications. In this chapter I take such an historical view, identifying three distinct
paradigms, or orientations to HCI research and application. Initially, HCI work focussed on
empirical laboratory evaluation of computer systems and techniques. Subsequently, empirical
studies of usability were organized by and addressed to cognitive theoretical description of human
behavior and experience. Currently, the focus of HCI work is shifting toward a more directive
role in invention, design and development. The progression of these three paradigms comprises a
case study of a field discovering what it is about, and more generally, of the variety of roles
available  in  the  psychology  of  technology. 

1.  Human  Factors  Evaluation 

The traditional role of psychologists working in the context of computer applications and
services is empirical evaluation of usability. The original research arena of human-computer
interaction is the psychology of programming and the professional programmer (Curtis, 1985;
Shneiderman, 1980). A prototypical example of this paradigm is a set of experiments conducted
by Sheppard, Curtis, Millman and Love (1979). In one of these, participants were given 20
minutes  to  reconstruct  from  memory  a  Fortran  program  of  26-57  lines  that  they  had  studied  for 
the  preceding  25  minutes.  Two  approaches  to  "structured"  program  organization  (linear  sequence, 
structured  selection  and  structured  iteration:  Dijkstra,  1972)  were  contrasted  with  a  "convoluted" 
organization (including backward exits from DO loops, arithmetic IFs, and unrestricted GOTOs).
Reconstructive  memory  for  the  convoluted  program  organization  was  poorer  (i.e.,  error  rates  were 
higher)  than  for  either  of  the  structured  organizations  (though  only  in  one  case  was  the  difference 
statistically  significant). 

Such  early  work  in  the  human  factors  of  programming  was  important  in  demonstrating  the 
feasibility  of  empirical  assessment.  By  addressing  some  of  the  timely  issues  of  the  day,  it 
broadened the grounds of debate in software technology from formal analysis and system
performance to include usability and productivity issues. The basic paradigm of directly
comparing two alternate designs in a usability evaluation is still the standard of practice in much
HCI  research  and  in  many  product  development  laboratories. 

1.1  Direct  empirical  contrast

The development of empirical methodologies for evaluation, and the exercise of these
methodologies in the context of software and system design, is a continuing need in HCI. Direct
empirical measurement is still the only adequate means of assessing the usability of software
techniques and computing artifacts (Carroll and Rosson, 1985; Curtis, 1980; Gould and Lewis,
1985). Establishing the importance of usability to the success of computing systems and
techniques, and developing and promoting empirical methodologies to make usability evaluations,
have been major foci of HCI work.

From  the  start,  HCI  evaluation  studies  were  strongly  influenced  by  research  practice  in 
experimental  psychology:  emphasis  was  placed  on  tightly  controlled  laboratory  approaches.  From 
an historical standpoint, this was a reasonable move: there was an acute lack of theory and
methodology for investigating usability. These laboratory studies generally took the form of direct
contrasts; computing artifacts or techniques were directly pitted against one another in a brief but
behavior-intensive measurement session. This evaluation work produced a variety of findings,
often framed as guidelines for software development practice and user interface design, generally of
the form "A is better than B." And perhaps even more importantly, the work set a more
objective standard for usability evaluations, and provided a systematic basis for scrutinizing
designers' hopeful intentions and trade press reviewers' glib comments.

However, there are many limitations inherent in the laboratory-based direct contrast
methodologies of experimental psychology. These limitations became clear when the
methodologies were applied in the complex practical contexts of HCI design. Controlled
laboratory studies of software are difficult to design and carry out. The investigator needs to
master programming languages and computer applications in order to be in a position to assess
others' performance and to interpret their experiences. The experimental tasks that are studied
necessarily require skilled human participants and involve learning and using very complex tools.
This is expensive and time-consuming research. Such difficulties just don't come up when one
takes an experimental approach to memorizing nonsense syllables, the stock-in-trade of traditional
experimental psychology, or to making timed responses to meaningful but simple objects like
isolated words, its more modern variant.

In  experimental  psychology,  the  sheer  differences  in  recall  rate  or  response  times  may  be  all 
there is to know about a person's performance in a task: the situations are relatively simple.
Understandably perhaps, such work is directed at collecting straightforward quantitative indicators
of  performance  like  task  times  and  error  rates,  and  formally  testing  these  for  statistical  significance 
of  direct  contrasts  (that  is,  computing  the  probability  that  obtained  score  differences  might  have 
occurred by chance). HCI situations, however, are not simple at all. In many cases it may be
more  important  to  know  how  people  approach  a  task,  or  how  they  feel  about  their  performance, 
than  it  is  to  know  how  quickly  or  successfully  they  perform.  Nevertheless,  the  early  commitment 
of HCI evaluation work to direct contrast studies created a strong bias for collecting quantitative
indicators  of  performance,  like  time  and  success  measures,  and  against  placing  primary,  or  even 
equal  emphasis  on  qualitative  data  (which  in  other  human  factors  contexts  have  often  played  a 
more  prominent  role;  Chapanis,  1959:  23-95). 
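
The parenthetical definition above (the probability that obtained score differences might have occurred by chance) can be made concrete with an exact permutation test. The following Python sketch is illustrative only: the function name and the task times are hypothetical, they are not drawn from any study cited here, and the cited studies may well have used other statistical tests.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Returns the fraction of all ways of splitting the pooled scores into
    groups of the original sizes that yield a mean difference at least as
    large as the one actually observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


# Hypothetical task times (seconds) for two interface designs.
design_a_times = [41.0, 38.5, 45.2, 39.9, 43.1]
design_b_times = [47.3, 44.8, 50.1, 42.6, 49.0]
print(permutation_p_value(design_a_times, design_b_times))
```

Note that the procedure returns a single number about a single score. It says nothing about how participants approached the task or how they felt about their performance, which is the limitation the paragraph above describes.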

These  constraints  of  direct  contrast  laboratory  methods  took  a  toll  on  the  relevance  of  HCI 
evaluation work. The difficulties of designing and conducting controlled experiments in complex
circumstances inclined investigators to make use of scaled-down tasks, for example, memorization
and reconstruction of small programs. The focus on quantitative differences inclined investigators
to  focus  on  the  simplest  of  performance  measures.  This  undermined  the  fundamental  objectives 
of  human  factors  evaluation,  transforming  questions  about  complex  human  behavior  and 
experience  in  complex  computing  environments  into  simple  scores  of  performance  on  toy-scale 
tasks.  Such  work  could  not  answer  the  underlying  "why"  questions  that  motivated  human  factors 
evaluation  in  the  first  place;  it  could  not  provide  the  depth  of  understanding  necessary  to  help 
guide  the  design  of  new  software  techniques  and  applications. 

Yet this style of work became quite pervasive. Ledgard, Whiteside, Singer and Seymour
(1980) assessed the use of symbolic notations in text editor commands by contrasting a command
language having extremely complicated symbolic conventions with one almost free of these.
Murrel (1983) contrasted message-based and window-based communication for a cooperative
decision-making task. Holt, Boehm-Davis and Schultz (1987) contrasted object-oriented design
with more standard approaches. But exactly what is it about symbolic notations that is bad?
What is it about window-based communication and object-oriented design that is good? None of
these projects resolved the overgeneral evaluation issue it posed. And none collected detailed
enough  information  to  contribute  to  a  conceptual  understanding  of  the  issues  involved. 

Worst  of  all  perhaps,  these  simplifications  frequently  did  not  even  produce  the  statistically 
significant  differences  they  were  adopted  to  facilitate.  The  use  of  indentation  to  highlight  structure 
in  program  listings  seems  intuitively  like  a  good  idea.  It's  a  simple  factor  that  can  in  principle  be 
conveniently  removed  from  the  complications  of  the  real  programming  process  for  direct  contrast 
laboratory study. However, Love (1977), Shneiderman and McKay (1976) and Weissman (1974)
all failed to find significant benefits of indentation. Studies of variable names have produced a
conflicting potpourri of results; sometimes mnemonic names are more effective than
non-mnemonic names and sometimes not (Shneiderman, 1980: 70-71). The daunting possibility
remains  that  it  was  because  of  the  trivial  tasks  that  were  studied  and  the  limited  types  of  data  that 
were  collected  and  analyzed  that  no  differential  benefits  were  found. 

Such  practical  problems  with  direct  contrasts  encouraged  experimental  designs  contrasting 
extreme positions, again to increase the possibility of measuring statistically significant differences.
Ledgard et al.'s (1980) assessment of symbolic conventions contrasted extremely complicated
examples of such conventions with an extreme absence of them. Liebelt, McDonald, Stone and
Karat (1982) showed that a menu system was easier to learn when the menu hierarchy was
organized than when it was disorganized (!). Indeed, in the Sheppard et al. (1979) experiment,
several alternate approaches to "structured" programming were consistently indistinguishable based
on the data; however, the extreme alternative of "convoluted" programming produced significantly
poorer performance than one of the structured approaches. In a sense, this study did not so
much verify the benefits of deliberately structuring code as it did the risks of deliberately
mis-structuring it. (Obvious and extreme evaluation contrasts are still sometimes professionally
encouraged as long as they employ "an interesting methodology," Green, 1987: 6.)

Finally,  human  factors  evaluation  work  is  highly  constrained  by  the  often  prodigious 
amounts  of  time  required  to  make  direct  experimental  contrasts  of  alternatives.  Indeed,  it  seems 
logically  doomed  to  consume  more  time  than  the  evolution  of  software  it  is  intended  to  guide. 
By  the  time  the  Sheppard  et  al.  (1979)  paper  appeared,  structured  programming  methods  were 
already the established practice. The evaluation work confirmed what had already happened,
rather than playing a causal role in the evolution of practice. This limitation of the evaluation
paradigm for HCI could be called the "evaluation dilemma": one cannot evaluate something that
does not yet exist, hence direct evaluation always lags development by some fraction of a
development  cycle  (Carroll,  1987a). 

In  sum,  the  exigencies  of  direct  contrast  laboratory  work  entrained  compromises  in  the  face 
validity of the work itself, and in the end, often failed to produce definitive or timely evaluations.
How  should  programs  be  structured?  How  should  hypertextual  information  systems  be 
navigated?  One  cannot  answer  these  questions  with  a  few  simple  performance  measures,  but  they 
are  surely  empirical  questions.  Answering  them  would  involve  developing  a  detailed 
understanding  of  what  people  do  and  try  to  do  with  programs  and  applications  and  the  rich 
interaction  of  these  goals  and  actions  with  the  constructs  of  programming  languages,  the  facilities 
of  computing  environments,  aspects  of  the  workplace,  and  many  other  factors. 

These  complexities  have  had  a  predictable  effect;  even  in  quarters  where  human  factors 
evaluation is the official operating paradigm, most of the impact of psychology on the
development of technology has come about through task analysis or consulting. Indeed, to a
considerable extent human factors evaluation has become an historical stage in the development
of current HCI. We return to the curious schism between what is officially anointed as standard
practice  and  what  is  in  fact  the  standard  practice  in  later  discussion  of  the  invention  paradigm  for 
HCI. 

1.2  Lack  of  theory 

The  guiding  hope  in  doing  evaluation  work  is  that  the  data  collected  and  the  methods 
developed  can  cumulate  into  coherent  analyses  about  why  some  systems  and  techniques  are  more 
usable than others, and about how to enhance the usability of future systems and techniques. It's
a bottom-up approach to developing theory. However, directly contrasting two complex
situations (e.g., two versions of a system) to determine which one is better is a poor vehicle for
sorting out and saving experience. Complex alternatives with no a priori theoretical analysis do
not become interpretable merely in virtue of a simple horse race. It would take an infinity of such
"one-off" contrasts to build a theory from the bottom up. Even the simple and controlled
situations  studied  in  experimental  psychology  would  be  intractably  indeterminate  without 
top-down  theoretical  direction. 

Many of the difficulties with direct contrast evaluations can be attributed to this lack of
theory.  The  use  of  toy-scale  problem  domains  and  simple,  quantitative  measures  is  problematic 
in  that  without  a  theory  of  HCI  domains  there  is  no  way  to  know  whether  a  toy  problem  is 
representative  of  a  real  problem  or  not.  There  is  no  way  to  know  whether  one  is  studying  a 
coherent  part  of  the  real  problem,  or  an  accidental  and  idiosyncratic  case.  Can  an  analysis  of 
writing 50-line programs be scaled up to the problem of writing 5,000-line programs? Is the task
of  pointing  a  cursor  at  an  arbitrary  screen  location  a  coherent  part  of  the  task  of  pointing  a  cursor 
in  the  course  of  editing  text?  Are  interpretations  of  isolated  system  events  related  to 
interpretations  of  the  very  same  events  embedded  in  a  real  stream  of  user  interaction?  Answering 
such questions is impossible without a theory with which to interpret the toy situations and to
extrapolate  from  them  to  real  situations. 

Sheil  (1981),  for  example,  noted  that  complexity  is  not  linear  with  program  length.  It 
certainly  seems  that  the  task  of  editing  a  5,000-line  program  raises  problems  of  navigation  and 
naming  conventions  that  are  just  not  raised  in  the  task  of  editing  a  50-line  program.  Elements  of 
HCI  situations  may  interact  and  trade  off  in  different  ways  as  the  problem  scale  or  the  task 
changes. Is avoiding GOTO statements more or less important than employing indentation in a
program  listing?  And  are  there  contexts  in  which  the  relation  is  inverted?  Again,  without  a 
theory  there  is  no  way  to  extrapolate  these  interactions.  Indeed,  one  can  do  little  more  than 
organize separate studies on the basis of superficial features (e.g., as pertaining to variable names
or  menu  systems).  Without  a  theory  of,  for  example,  how  people  understand,  name,  and 
remember  entities,  there  is  no  way  to  work  back  from  a  variety  of  performance  differences 
obtained in a variety of experimental settings to an explanation of the underlying concepts that
caused  the  differences  (see  Newell,  1973). 

In the absence of a theoretical framework for understanding usability, HCI evaluation work
has  had  to  address  issues  at  a  very  large  grain  of  analysis.  Hauptmann  and  Green  (1983),  for 
example,  contrasted  a  natural  language  interface  with  a  menu  interface  for  creating  business 
graphics  (failing  to  find  any  significant  differences  in  time,  errors  or  attitudes).  Of  course, 
contrasting natural language with menus is painting with a rather broad stroke: how could a single
experimental  contrast  resolve  such  a  multifaceted  contrast?  Were  the  two  interfaces  individually 
optimized  to  be  the  best  interface  possible  in  their  respective  interface  styles?  Were  they 
controlled  to  have  the  same  functional  capabilities  and  the  same  task-relative  functional 
capabilities?  The  same  kinds  of  questions  arise  for  the  examples  discussed  earlier,  evaluating 
structured  programming,  object  oriented  programming  and  symbolic  notations.  The  lack  of 
theory  forces  these  crude  contrasts;  but  the  crude  contrasts  prohibit  pertinent  or  univocal  results. 

Methods and theories in software technology are often collections of loosely connected
prescriptions. Ideas like structured programming and direct manipulation (Shneiderman, 1983)
are important theoretical concepts, and they surely carry empirical consequences. But they are
not falsifiable in the Popperian sense (Popper, 1965): one cannot hope to reject such ideas tout
court on the basis of isolated laboratory tests; to try to do so is to get the logic of the inquiry
wrong.  From  our  current  perspective  of  a  few  years  hence,  it  is  clear  that  no  outcome  of  the 
Sheppard  et  al.  (1979)  study  could  have  rejected  structured  programming  as  an  appropriate 
prescriptive  theory.  The  real  evaluation  need  is  for  detailed  qualitative  information  that  can  guide 
the  revision  and  integration  of  such  ideas.  The  issue  is  not  whether  structured  programming  is 
good,  or  indeed  whether  it  is  better  than  some  other  approach;  the  issue  is  what  structured 
programming  really  consists  in,  how  in  detail  it  impacts  actual  programming  tasks,  and  how  it  can 
be  integrated  into  routine  programming  practice. 

The assessment goal is just too limiting: a paradigm that merely evaluates distinctions
articulated by others deprives itself of playing any directive role (Sheil, 1981). In this context, we
can  understand  why  studies  like  Sheppard  et  al.  failed  to  lead  to  the  development  of  an  articulated 
theory  of  programming:  the  evaluation  enterprise  bound  itself  to  what  already  existed, 
commenting  at  a  high  level  on  the  appropriateness  of  specific  techniques  from  the  mid-1970s.  A 
poignant  example  is  the  work  showing  that  input  error  rates  are  reduced  when  using  teletype 
terminals  instead  of  visual  display  units  (Walther  and  O'Neil,  1974;  Carlisle,  1970).  It  was  never  a 
possibility that teletype terminals would supplant visual display units through the course of
technological evolution; quite the contrary. The bald evaluation result, without specific
implications for the design of future visual display devices, can only be seen as an historical
curiosity.

Empirical evaluation of software and systems is a key to usability. But it is a separate
question  whether  a  science  of  human-computer  interaction  can  arise  out  of  this  activity.  In  fact,  it 
did  not.  The  evaluation  paradigm  introduced  psychology  and  psychologists  to  the  HCI  problem 
domain.  It  was  a  platform  for  establishing  the  importance  of  usability  and  for  developing 
empirical approaches to measuring the usability of systems and software. However, its
methodological  commitments  and  lack  of  theory  cast  it  in  a  supporting  role  in  emerging  software 
and  user  interface  science;  more  of  a  commentator  on  new  technology  than  a  directive  force. 
The challenge that this raised was how psychology could play a more directive role in the
development  of  new  software  and  user  interface  technology. 

2.  Cognitive  Description 

In  the  early  1980s  there  was  a  shift  toward  bringing  HCI  research  under  the  aegis  of  broader 
psychological  theory.  Shneiderman  (1980:  51),  for  example,  used  Miller's  (1956)  classic  paper  on 
human  information  processing  limitations  to  derive  the  prescription  that  programmers  avoid  the 
use of GOTO constructs. Shneiderman analyzed the process of understanding programs as
involving  the  recoding  of  lines  of  code  into  meaningful  "chunks".  GOTO  jumps  in  a  program 
text  disrupt  this  structure  by  functionally  chunking  nonadjacent  lines  of  code.  In  1983,  Card, 
Moran  and  Newell  (1983)  published  a  compelling  monograph  adapting  information  processing 
psychology to the description of fluent user interaction with text editors. These efforts had an
enormous  effect,  enlarging  and  intensifying  interest  in  the  psychology  of  usability  both  within 
computer  science  and  within  psychology. 

This  shift  confronted  one  of  the  key  limitations  of  earlier  work,  the  lack  of  theory.  Tying 
specific  empirical  results  to  theories  of  human  information  processing  provided  means  to  integrate 
diverse  results,  to  resolve  nonsignificant  or  conflicting  findings,  to  dampen  the  distortions  of  poor 
research,  but  most  importantly  to  develop  abstractions  that,  in  principle,  could  help  lead  the 
development  of  software  technology  and  user  interface  design. 

However, this work also raised new issues and problems. Aligning HCI phenomena with
cognitive  descriptions  of  those  phenomena  is  useful  to  the  extent  that  the  cognitive  descriptions 
themselves  are  rich,  revealing  and  well-integrated.  In  fact,  psychological  theory  is  at  least  as 
fragmented as software theory and methodology. Building a psychology of usability by placing
this body of fragmented theory into correspondence with software situations risks inheriting the
fissures  as  well  as  the  solid  ground.  Ironically,  cognitive  description  work  also  threatened  the 
major  achievement  of  human  factors  evaluation,  namely,  establishing  the  centrality  of  direct 
usability  testing  to  the  ultimate  success  of  computing  systems  and  techniques.  The  cognitive 
description  paradigm  entrained  a  strongly  analytic  conception  of  software  design,  raising  the 
question of how much direct evaluation might be necessary if a good theory were in hand.

2.1  Breadth  versus  depth 

Scientific  psychology  seeks  to  understand  behavior  and  experience  by  providing  laws, 
concepts, and explanations. However, there are severe limits on what types of phenomena
psychology can address with these goals and tools; there are ranges over which the goals and tools
make  sense  and  outside  of  which  they  do  not.  In  particular,  academic  psychology  typically 
attempts  to  capture  generalizations  across  domains.  But  fine  details  of  specific  task  situations  can 
be  very  important:  what  a  person  thinks  and  decides  to  do  is  often  ascribable  to  knowledge  of  a 
single fact, e.g., the name of a particular command in a particular system. These fine-grained
details serve as boundary markers for theorizing: scientific laws that must refer to individual facts
as conditions seem unwieldy, and psychologists routinely make a strategic retreat to abstract or
artificial  domains  to  control  such  details. 

This  is  a  reasonable  heuristic,  with  extensive  precedent  in  the  sciences.  Classical  mass  point 
mechanics is developed under the idealization of frictionless contact, even though there are no
frictionless systems. Other theoretical apparatus has been developed to add back the effects of
friction in real systems. The difficult details of friction are treated as "perturbations" of the
classical theory (Gleick, 1987). Similarly, the traditional research strategy in psychology has been
to focus on sweepingly general issues and distinctions under the idealization that domain and
situation context can be ignored. Basic psychological research addresses topics like the "structure
of memory," but not, for example, "memory for Unix commands" (Norman, 1981). It tries to
generally resolve "big" issues like "is there a separate mental type for imagery?" (Pylyshyn, 1973;
Paivio,  1971). 

It  turns  out  that  describing  frictionless  contact  provides  a  useful  foundation  for 
understanding the motion of real objects in real circumstances. Even though the effects of friction
are not simple, treating these effects as perturbations of an idealized theory has also proven
tractable in engineering applications (for example, computing trajectories). The question is
whether the same basic strategy is useful in psychology. This is an open question. Newell (1973),
for example, criticized the pursuit of sweeping dichotomies like the existence of a separate mental type
for imagery, saying "you can't play twenty questions with nature and win." Indeed, the emergence
in the 1980s of Cognitive Science as a broader discipline, incorporating psychology with the
serious consideration of the structure of task domains, can be seen as a response to traditional
idealizations  (Carroll,  1988a). 

Chase and Simon's (1973) classic study of expertise in chess showed that, for a
reconstructive memory task, chess masters tended to recall piece positions in attack and defense
groupings. This study has had two very different legacies. On the one hand, it opened up a
variety of questions about domains. How are chess piece groupings indexed in a player's
memory? How are they accessed in realistic tasks (like playing chess, as opposed to reconstructive
memory for arbitrary board positions)? How does expertise in chess develop through significant
spans of time? Many of these issues have been pursued, and in a variety of domains (see Chi,
Glaser and Farr, 1988), though many would argue that the work still takes too narrow a view of
the process of attaining expertise and of the nature of expert knowledge and performance (e.g.,
Dreyfus and Dreyfus, 1986).

On the other hand, Chase and Simon's result was sweepingly generalized as "experts have
chunks," and has been mechanically replicated in domain after domain. There is no rich and
well-integrated theory of either experts or chunks outside of considerations of specific domains.
Thus, these studies show only that when humans know something about a domain and are asked
to do reconstructive memory tasks of an arbitrary sort, they use what they know to do the task.
A series of these studies has been undertaken in HCI contrasting memory performance for
scrambled and unscrambled program listings (Adelson, 1981; McKeithen, Reitman, Rueter, and
Hirtle, 1981; Shneiderman, 1980). This work showed that people with programming experience
can  use  knowledge  of  language  structures  in  organizing  their  memories. 

This finding has not led to rich understandings of how people achieve expertise in
programming or about how programming knowledge is indexed in memory and accessed in
performance. It has not helped to guide the development of new software tools and
environments. These cognitive descriptions do not address and provide no guidance in practical
aspects of programming (the design of programming languages, environments, education, etc.);
they  do  not  even  engage  issues  specific  to  the  domain  of  programming  (the  types  of  modules  one 
would  want  in  a  library  to  facilitate  code  reusability). 

An  extensive  tradition  of  psychological  research  describes  learning,  memory  and  error 
patterns for paired-associates, the classic nonsense syllable (e.g., Esper, 1925; Postman and Stark,
1962). This work has been applied to the analysis of user performance with various types of
command languages (Barnard, Hammond, Morton, Long and Clark, 1981; Carroll, 1982;
Landauer, Galotti and Hartwell, 1983). For the most part, these applications have been no less
mechanical than those of the "experts have chunks" work. Yet they have been relatively more
successful in that the cognitive descriptions developed for command language interactions have
had fairly specific prescriptive content for command language design. Indeed, HCI research on
command names has led to specific revisions in philosophical and linguistic conceptions about
what names are (Carroll, 1985).

But  this  work,  and  indeed  all  cognitive  description  work  in  HCI,  is  subject  to  a  very 
fundamental problem in the underlying logic of the inquiry. Psychology concerns itself with
existence: is there a separate mental type for imagery? HCI, like any applied science domain,
concerns itself with impact: how much of a difference will certain types of consistency make in the
learnability of a command language? This is why the "experts have chunks" work seems
reasonable from the perspective of our curiosity about chess masters and other experts, but
difficult to apply in the face of questions about how to support experts and facilitate the
development of expertise. This is also why the use of extreme contrasts, like scrambled programs
versus  structured  programs,  can  make  sense  in  the  pursuit  of  basic  theory,  but  much  less  so  in  the 
pursuit  of  meaningful  application. 

Landauer (1987a) has recently called attention to this in observing that while basic
psychology  routinely  focusses  on  the  "significance"  of  effects,  it  typically  disregards  the  size  of 
effects.  Cognitive  descriptions  framed  in  terms  of  existence  dichotomies  can  be  assessed  by  the 
statistical  significance  of  direct  contrasts;  do  expert  programmers  chunk  more  than  novices? 
However,  such  differences  do  not  guarantee  that  the  effects  will  be  large  enough  to  matter. 
Would  it  matter  if  experts  reliably  chunked  2  percent  more  than  novices?  Would  it  matter  if 
scrupulously  consistent  command  languages  were  learned  3  percent  faster  than  randomly 
consistent  languages?  To  determine  the  practical  size  of  effects  one  needs  to  consider  cost-benefit 
tradeoffs in realistic tasks. Chunking may have a big effect on people trying to memorize
scrambled  little  programs,  but  the  size  of  effect  question  forces  attention  to  real  programmers 
writing  and  reading  real  programs.  The  two  situations  might  be  quite  different. 
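
The distinction between the significance and the size of an effect can be made concrete with a small calculation. The sketch below is illustrative only: the recall scores, the common standard deviation and the group size are invented numbers, not data from any study cited here. It compares two hypothetical groups whose means differ by 2 percent and reports both a two-sample z statistic and a standardized effect size (Cohen's d, a measure this paper does not itself use).

```python
import math

def two_sample_z(mean_a, mean_b, sd, n):
    """z statistic and two-sided p-value for two equal-sized groups
    that share a common standard deviation (normal approximation)."""
    standard_error = sd * math.sqrt(2.0 / n)
    z = (mean_a - mean_b) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

def cohens_d(mean_a, mean_b, sd):
    """Standardized mean difference: the size of the effect in sd units."""
    return (mean_a - mean_b) / sd

# Hypothetical numbers: experts recall 2 percent more items than novices.
novice_mean, expert_mean = 10.0, 10.2
common_sd, group_size = 2.0, 2000

z, p = two_sample_z(expert_mean, novice_mean, common_sd, group_size)
d = cohens_d(expert_mean, novice_mean, common_sd)
print(f"z = {z:.2f}, p = {p:.4f}, d = {d:.2f}")
```

With groups this large, the difference is reliable by conventional statistical criteria, yet the standardized effect remains small. Whether an effect of that size matters is a cost-benefit question that the statistics alone cannot answer.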

2.2 Design by deduction

HCI is fundamentally a design domain; it exists in the first place because of the need to
design more usable computing artifacts for people to use. Design in a complex and poorly
charted domain can seem like trial and error. How should user interface design work proceed to
ensure more usable user interfaces? The human factors evaluation paradigm sought to address
this kind of question by providing methodology for directly evaluating design techniques (like
structured programming) and particular artifacts (for example, a particular programming language
or programming environment). But direct evaluation operates on a case by case basis. The
cognitive description paradigm sought to improve upon this by providing theoretical abstractions
beyond the specific cases (see Moran, 1981).

Card, Moran and Newell (1983) made what is surely the most thorough and disciplined
attempt to interpret and develop modern information processing psychology into a foundation for
the design of computer systems. In their GOMS model (an acronym for Goals, Operators,
Methods and Selection rules), users hierarchically decompose their goals into successively finer
subgoals until these match a basic set of methods. The user has rules for selecting methods
appropriate  to  the  current  situation,  and  each  method  itself  consists  of  a  sequence  of  operators, 
keypresses  and  hand  motions.  This  analysis  was  fitted  to  a  variety  of  text  editing  performance 
data,  in  many  cases  yielding  consistent  values  for  the  model's  parameters. 
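
The structure of a GOMS analysis can be sketched in a few lines of code. The sketch below is a minimal illustration, not Card, Moran and Newell's formulation: the goal names, method names, selection rule and operator durations are all hypothetical, chosen only to show how goals decompose into subgoals, how a selection rule picks a method, and how a predicted time is obtained by summing operator durations.

```python
# Assumed operator durations in seconds (illustrative, not fitted parameters).
OPERATOR_TIME = {"keypress": 0.2, "point": 1.1, "home_hands": 0.4}

# Each method is a fixed sequence of operators (hypothetical methods).
METHODS = {
    "delete_by_search": ["keypress"] * 8,
    "delete_by_pointing": ["home_hands", "point", "keypress"],
}

# A goal decomposes into subgoals; goals not listed here are leaf goals.
GOALS = {"edit_document": ["delete_word", "delete_word"]}

def select_method(goal, context):
    """Selection rule: point at the word if it is visible, otherwise search."""
    if goal == "delete_word":
        if context["target_visible"]:
            return "delete_by_pointing"
        return "delete_by_search"
    raise KeyError(f"no selection rule for goal {goal!r}")

def predict_time(goal, context):
    """Recursively decompose a goal and sum the durations of its operators."""
    if goal in GOALS:
        return sum(predict_time(sub, context) for sub in GOALS[goal])
    method = select_method(goal, context)
    return sum(OPERATOR_TIME[op] for op in METHODS[method])

visible = predict_time("edit_document", {"target_visible": True})
hidden = predict_time("edit_document", {"target_visible": False})
print(f"visible: {visible:.1f} s, hidden: {hidden:.1f} s")
```

The sketch has no representation of errors or of problem solving: every goal resolves to exactly one method and every method runs to completion.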

However,  the  theory  proved  quite  limited  in  application  to  user  interface  design.  GOMS 
was  not  able  to  describe  problem-solving  activity,  only  routine,  over-practiced  performance.  In 
fact,  it  could  not  describe  errors  at  all,  even  though  nearly  a  third  of  the  routine  behavior  it  sought 
to describe consisted of error and error recovery. It was also severely hampered by the race
between function and usability: By the time it had produced good performance descriptions for
error-free, over-practiced behavior on line-oriented editors, the focus of concern in user interfaces
and end-user applications had moved on to other problem areas. (See Carroll and Campbell,
1986, for further discussion.) The work had its greatest impact on relatively low-level aspects of
human-computer interaction, like the analysis of pointing devices (Card, English and Burr, 1978).
Indeed, it appears that this approach may only work for user interaction events on the order of
one second in duration in which errors are extremely rare and/or extremely regular (!), and for
technological contexts that are unchanging on the order of decades (Newell and Card, 1985). Few
design problems in HCI fall into this rather severe category.

Most  cognitive  description  work  is  far  less  theoretically  ambitious  than  the  GOMS  work. 
For example, the use of menu selection as an alternative to typed commands is sometimes
"deduced" from the fact that humans are better at recognition than at recall (e.g., Tennant, Ross
and Thompson, 1983). This is terribly oversimplified. Users of menu systems must deal with
formidable navigation problems (MacGregor and Lee, 1987; Robertson, McCracken, and Newell,
1981). They must deal with complex morphological, semantic and referential relations between
various selection names (Carroll, 1985). Here again, the evolution of user interface technology is
complicating the simple dichotomies: rich aliasing (Gomez and Lochbaum, 1985) may
substantially mitigate the relative difficulty of recall, and alternative approaches to menu design
may carry differing performance implications (pop-up menus, multiple selection menus, active
forms). Finally, though the advantage of recognition over recall is an established sweeping
principle in psychology (e.g., Crowder, 1976), Black and Sebrechts (1981) have observed that
there are circumstances in which the reverse is true.

We earlier considered Shneiderman's (1980) reference to Miller's (1956) analysis of human
information processing limitations in grounding the prescription to avoid GOTOs. Miller's
specific argument, however, does not consider spatial or temporal proximity of items to be
"chunked." Accordingly, the GOTO prescription cannot be deduced from Miller's analysis.
Indeed, virtually nothing of much interest could be deduced from the specifics of Miller's analysis.
The connection is more informal: Miller's work called attention to the (obvious) fact that
humans are limited with respect to the information they can manage; Shneiderman was inspired
by this to suggest a particular tactic for easing information management in programming. The
informality of the theoretical linkages is not specially problematic: the non-psychological
theory-components of HCI do no better (e.g., what is an interface toolkit?). Having theories
cogent enough and pertinent enough to even informally direct and inspire design work is a big
advantage. 

The  problem  vis-a-vis  design  by  deduction  is  that  in  none  of  these  examples  of  cognitive 
description  applied  to  design  do  we  have  in  hand  the  ancillary  theoretical  apparatus  to  deductively 
bridge between the "leading claims" and the implementation details. GOMS is probably a
reasonable first approximation framework for thinking about task analysis. Recognition probably
is easier than recall in many circumstances. GOTOs probably do strain human information
processing capacity. But to use this theoretical material deductively in design we need to know
precisely  how  the  details  of  given  situations  interact  with  and  modulate  the  psychological 
principles.  None  of  the  theories  is  complete  enough  to  tell  us  this.  Hence  none  can  be  used 
deductively. 

To an extent, this lack can be addressed through theory development. For example, Polson
(1987) has developed the GOMS approach into a potentially more useful design tool. However,
other considerations indicate that HCI design can never be rendered deductive. The particular
complexity of software technology stems from the fact that everything inherently interacts with
everything else (Brooks, 1987). The technological context plays an important role in determining
whether an idea will survive at all. For example, object oriented techniques have been seen as a
major advance in software technology, but the successful use of these techniques is limited by the
availability of appropriately supportive programming environments (Hehbing, 1987). Many times
these interactions cannot be anticipated at all. Presenting rich information displays and direct
access to running code often entrains cluttered displays and inefficient performance. Many of
these critical details and interactions cannot be analyzed before a prototype system is built.
Indeed, one of the most important determinants of the success of software technologies is their
amenability to revision and reimplementation on hardware and software platforms not even
available when they were first developed (Brooks, 1987).

The cognitive description paradigm in HCI was a genuine advance. It provided independent
conceptual foundations for the psychology of HCI that made it possible to develop useful theory.
Reciprocally, it brought the HCI domain within the purview of academic psychologists. This has
opened a two-way dialog within which basic cognitive psychology may stand to gain as much
from the cognitive engineering case study of HCI as HCI may stand to gain from the science of
cognition (Carroll, 1987b; Norman, 1987).

3.  Usability-Innervated  Invention 

The human factors evaluation and cognitive description paradigms share basic assumptions
about the position of psychological analysis in HCI. They assume that psychology operates
outside the development process, outside even the research prototyping process. They assume
that the role of psychologists in HCI is to offer commentary: evaluations, theoretical descriptions,
but not direct participation in the invention, design and development of new HCI technologies
and artifacts. This assumed positioning and role for psychology in HCI is all the more striking
when one recognizes that HCI is fundamentally a design domain. HCI is about designing new
software  tools  and  user  interfaces.  Seen  in  this  light,  the  traditional  paradigms  for  psychology  in 
HCI  have  pursued  a  tangential,  supporting  role  in  the  field's  key  endeavor  and  raison  d'etre. 

It  has,  of  course,  been  recognized  that  serious  usability  research  needs  to  pay  serious 
attention to the nature of HCI domains and tasks. This concern has always been in the focus of
HCI work. But being relevant to designer needs is not the same as taking the initiative in the
design work itself. The implicit division of labor in HCI has had chronic organizational
consequences. For example, a recent panel discussion at the ACM CHI '88 Conference asked
how human factors specialists, and cognitive scientists working on usability, can organize to
effectively work with designers and developers (Grudin, 1988). The answers offered are revealing:
human factors professionals should be placed directly into development groups, human factors
professionals should manage the developers, usability consultants from outside the organization
should be used (!). The traditional paradigms created an organizationally adversarial basis for the
exchange of commentary between software developers and psychologists.

The traditionally assumed positioning and role of psychology within HCI is now being
seriously questioned. In this new paradigm of "usability-innervated invention," usability is seen as
connecting the invention of HCI artifacts to user needs no less essentially than nerves connect
organs and muscle tissues to sensory and motor brain centers. The activity of muscles and organs
is meaningful only insofar as it is innervated by sensation and action; the activity of inventing
HCI artifacts is meaningful only insofar as it is innervated by usability considerations.
Conversely, sensory and motor centers exist at all to innervate the body's muscle and organs;
understanding usability is important because it produces the critical direction for HCI invention.
In this view, HCI artifacts are not merely evaluated or described in terms of their usability; they
are  conceived  and  created  for  usability. 

3.1  Psychology  as  a  mother  of  invention 

Building  and  inventing  things  is  not  a  traditional  activity  in  psychological  research. 
Psychology  is  part  natural  science  and  part  social  science;  its  traditional  focus  is  the  analysis  of 
natural and social phenomena. In the technological arena of HCI, this traditional focus was
straightforwardly extended to the analysis of technology through evaluation and theoretical
description. But these traditional activities also provided the opportunity for psychologists
working in HCI domains to develop technological skills and domain experience. In many cases,
these psychologists are now in a position not only to analyze usability problems, but to synthesize
technological solutions. In his plenary address at the CHI + GI '87 Conference, Tom Landauer
(1987b)  succinctly  captured  this  in  casting  "psychology  as  a  mother  of  invention"  in  HCI. 

Many  recent  prototype  systems  and  interface  techniques  were  invented  by  psychologists  to 
instantiate specific psychological claims and to allow these claims to be explored and developed
empirically. For example, Landauer's group analyzed human performance in a variety of naming
and reference tasks to develop specific tools and techniques for keyword information systems (e.g.,
Furnas, Landauer, Gomez, and Dumais, 1983). The database system Rabbit (Williams, 1984) and
its "retrieval by elaboration" paradigm embodied claims about the structure of human memory
and memory search as consisting in the manipulation of concrete exemplars. The variety of
"Minimalist" training materials and software environments described in Carroll (1988b) embody a
set  of  claims  about  how  new  users  learn  computer  applications.  The  display  management  system 
Rooms  (Card  and  Henderson,  1987)  embodies  an  analysis  of  typical  user  working  sets  (services 
and  data  accessed  simultaneously). 

User interface metaphors are a systematic and detailed intrusion of psychology into modern
computing system development (Carroll and Thomas, 1982; Carroll, Mack and Kellogg, 1988).
For example, systems that provide electronic workspaces that can be written to and viewed by
multiple users in a cooperative interaction session are presented as "chalkboard" systems in the
way that they are described to users and even in the way they appear and operate (Stefik, Foster,
Bobrow, Kahn, Lanning and Suchman, 1987). Thinking of the system as a physical chalkboard
provides an initial familiarity for the user. It also suggests specific tasks and approaches to
accomplishing them. It provides the user with an initial conceptual vocabulary within which to
couch questions and draw conclusions. (Analogous points could be made for other new
computer interface designs ranging from task oriented window layout (Carroll, Herder and
Sawtelle, 1987), to object oriented programming (Rosson and Alpert, 1988).)

Many  recent  structure-directed  editors  and  intelligent  tutoring  systems  for  programming  are 
clearly vehicles for instantiating psychological analyses of programming tasks and learning. For
example, analyses of programming plans (e.g., Soloway and Ehrlich, 1984) are embodied in the
Bridge tutor (Bonar and Liffick, 1987). Analyses of how students learn to program in Lisp
(Anderson, Farrell and Sauers, 1984) have been embodied in a variety of intelligent tutoring
systems for teaching Lisp (Anderson and Skwarecki, 1986; Reiser et al., 1988). Indeed, Anderson
has  recently  (1987)  argued  that  designing  and  evaluating  computer  tutors  provides  unique 
advantages  to  basic,  academic  psychological  research  into  the  mental  procedures  and  knowledge 
that  comprise  human  cognition. 

Of course, psychologists per se are not always the inventors, but psychological rationale
routinely plays a determining role in the invention of new software technology. In this work,
HCI transcends merely serving as an arena for applying empirical experience and theoretical
analysis to invention. A better description is that a two-way relationship has developed in which
HCI artifacts themselves are treated as media for codifying experience and analysis, in which HCI
theories are "applied invention" no less than HCI artifacts are "applied theory" (Carroll and
Campbell, 1988). For example, the theoretical development of the concept "direct manipulation"
(Shneiderman, 1983) devolved from a collection of specific HCI inventions. But this constitutes a
radical shift in the underlying ontology of HCI, namely, seeing computer artifacts like interface
metaphors,  menu  hierarchies,  programming  paradigms  and  languages,  tutors,  and  the  like  as 
playing  theory-like  roles. 

One standard role of theories is to codify empirically falsifiable claims (Popper, 1965).
Artifacts embody testable claims about how users can understand and make use of system
function in a medium that makes appropriate empirical investigations possible. Each command
name,  each  icon,  each  menu  makes  claims  about  the  ways  users  think  about  the  tasks  they  will 
undertake  with  these  systems. 

These claims are mutually interrelated, creating a sort of web of theory more intricate and
more comprehensive than any analysis deducible from conventional discursive psychological
theory. A piece of software, like the Unix operating system, makes a huge number of specific
claims about what command names, operations, and so forth will be convenient for users. These
claims can be wrong (see Norman, 1984). Desktop interfaces make myriad claims about familiar
presentation and natural conceptual vocabularies, about clipboards, stationery pads, folders, waste
baskets -- about how these objects behave and interact. Moreover, the leading claims, for
example  as  integrated  within  a  metaphor  like  the  desktop,  have  myriad  specific  dependencies  on  a 
diverse  set  of  ancillary  claims  (for  example,  claims  inherent  in  the  presentation  of  highlighting, 
preferences,  and  scrolling  elevators). 

Empirical  theories  provide  explanations  by  placing  logical  and  causal  constraints  on 
phenomena.  Artifacts  support  explanations  of  the  form  "this  specific  feature  has  this  specific 
usability consequence." The "Tear Off" command in the early Lisa desktop system provides an
example. In this system, "Tear Off" spawns a new instance from a prototype object: Tear Off
stationery applied to a stationery pad creates a piece of stationery. The command was a menu
selection, not a gesture (Move is an example of a gestural command: one selects with the pointer
and then moves by moving the pointer). Thus, there was a sort of inconsistency between Move
and Tear Off. Some users initially tried to Tear Off by selecting and then rapidly sweeping the
pointer (making a tearing gesture). This error has little consequence, and proved relatively easy
for users to sort out on their own. A more difficult problem stemmed from the fact that Tear Off
also applied to non-pad objects like folders: the user needed to Tear Off from a "folder pad" to
get a new folder (Carroll and Mazur, 1986).

Theories  also  contribute  to  the  development  of  science  by  providing  useful  foundations  for 
further  theorizing.  Artifacts  facilitate  theoretical  development  in  the  sense  that  given  artifacts 
make  task  analyses  possible  that  in  turn  facilitate  the  invention  and  development  of  new  artifacts. 
The  typewriter  metaphor  was  a  critical  step  in  the  development  of  the  desktop  metaphor,  which 
in  turn  has  been  critical  in  the  development  of  newer  interface  metaphors  such  as  rooms  and  task 
maps.  Understanding  user  problems  at  this  level  of  qualitative  detail  can  be  of  immediate  use  in 
the design of new software artifacts. Indeed, in subsequent desktop interface products the Tear
Off command evolved into a Make New Folder command.

Theories  enable  and  compel  greater  explicitness  in  empirical  claims.  This  is  part  of  the 
traditional  motivation  to  formalize.  Artifacts  serve  this  role  in  a  manner  quite  analogous  to 
classical  views  of  simulation  (Fodor,  1968;  Newell  and  Simon,  1972).  To  paraphrase  Newell  and 
Simon, both must "perform" the claims they incorporate: the implementation details must be
made explicit, which can lead to further learning about the nature of the claims being made.
Simulations, however, are used by psychologists, for specific research purposes; artifacts are used
by a wide range of people to do real work. Simulations are interpreted and evaluated by criteria
of descriptive adequacy (Chomsky, 1965): a simulation of problem-solving behavior may be
judged on the basis of how closely it fits the sequence of moves in a verbal protocol, whether it
predicts all and only the kinds of errors that are observed, etc. Artifacts are interpreted and
evaluated  by  criteria  of  usability. 

Simulations are usually seen as convenient vehicles for theories, but not as necessary. Are
artifacts merely convenient expressions of HCI theories, or do they play a more fundamental role?
This question cannot be answered now, but it seems likely that artifacts are in principle
irreducible to a more conventional theory medium. The reason for this, if it is so, would be the
unbounded interrelation of the many claims inherent in a computer artifact, the fact that
everything in software seems to impact everything else (Brooks, 1987), the fact that details of
context and situation critically impinge upon the usability of systems (Whiteside and Wixon,
1987; Winograd and Flores, 1986; Suchman, 1987). All these may be views of the same
underlying state of affairs; the design of software may be of an order of complexity beyond that
which conventional theories can explain or predict (Hayek, 1967).

In  the  introduction,  we  considered  the  apparent  paradox  that  product  innovations  in  user 
interface design often lead HCI research rather than following from it in the conventionally
assumed  flow  of  "technology  transfer"  from  Research  to  Development.  However,  the  view  of 
HCI  in  which  its  artifacts  play  theory-like  roles  in  organizing  research  defuses  the  perplexity  of 
this  state  of  affairs.  Empirical  research  often  follows  the  explicit  codification  of  theories.  In  HCI 
the  medium  of  choice  for  expressing  theories  of  usability  is  in  many  cases  an  exemplary  artifact. 
The  appearance  of  such  an  artifact  predictably  stimulates  empirical  research. 

3.2  Ecological  analysis 

The  paradigm  of  usability-innervated  invention  has  many  consequences  for  the  traditional 
empirical  roles  of  psychologists  working  in  HCI  domains.  There  are  consequences  both  for  what 
kinds  of  situations  are  studied  and  for  what  kinds  of  information  are  sought  in  empirical  studies. 
In  both  areas,  the  driving  considerations  devolve  from  invention.  The  model  of  research  practice 
in  experimental  psychology,  originally  adapted  to  HCI  through  human  factors  evaluation,  has 
been  augmented  by  the  requirement  that  empirical  work  bear  more  directly  on  the  invention  and 
development  of  new  artifacts.  In  this  sense,  current  work  is  shifting  toward  greater  responsiveness 
to  the  ecology  of  HCI  as  an  ecology  of  invention,  design  and  development. 

Ecologically responsive empirical analysis of HCI domains takes place in vivo: in software
shops, more often than in psychological laboratories. It addresses whole problems, whole
situations, when they are still technologically current, when their resolution can still constructively
impact the direction of technological evolution. Its principal goal is the discovery of design
requirements, not the verification of hypothesized direct empirical contrasts or cognitive
descriptions. A recent example is Curtis, Krasner and Iscoe's (1988) study of the software design
process. The detailed interviewing of real designers produced specific technical proposals for
improving software tools and the coordination of project management, an assessment of major
bottlenecks, and a new framework for thinking about software design as a learning and
communication process. (See Nielsen, Mack, Bergendorff and Grischkowsky, 1986, and Rosson,
Maass and Kellogg, 1988, for similar kinds of studies.)

Carroll and Campbell (1988) characterized HCI invention in terms of the "task-artifact
cycle";  a  given  understanding  of  the  tasks  programmers  need  to  and  want  to  accomplish  helps  to 
define  objectives  for  new  software  artifacts  (languages,  environments  and  education,  etc.)  to 
support  them  in  these  tasks.  Any  artifact  fundamentally  alters  the  tasks  for  which  it  was  designed, 
raising the need for further task analysis, and in time for the design of further artifacts, and so on.
An example is the progression from user interfaces based on the typewriter metaphor to those
based on the desktop. Early word processing applications were designed to exploit specific
knowledge  their  users  already  had  about  typewriting,  function  keys,  data  display,  command  names 
and  so  forth  (Carroll  and  Thomas,  1982). 

The  typewriter  metaphor,  however,  altered  office  tasks  and  in  doing  so  helped  to  open  up 
technological  possibilities  by  preparing  users  for  further  electronic  office  applications  (calculators, 
calendars, mail, database). This evolution in office task expectations and understandings was
better  addressed  by  systems  employing  the  desktop  metaphor.  However,  desktop  systems  also 
presented  a  variety  of  specific  problems  and  possibilities  to  users  (Carroll  and  Mazur,  1986; 
Whiteside  et  al.,  1985).  This  further  task  analysis  has  again  helped  to  define  further  interface 
artifacts, new metaphors for display organization in user interfaces ("rooms," Card and Henderson,
1987;  "task  paths,"  Carroll,  Herder  and  Sawtelle,  1987). 

To constructively operate within the task-artifact cycle, HCI empirical work must provide
rich  analyses  of  real  users  working  on  real  tasks.  The  main  research  setting  for  such  ecological 
analysis  is  the  case  study.  A  case  study  can  begin  and  end  anywhere  in  the  task-artifact  cycle;  the 
key  requirement  is  access  to  real  situations.  Case  study  task  analysis  usually  consists  of  the 
collection  of  detailed,  qualitative  information  (thinking  aloud  protocols,  interviews).  Such  data 
are  arbitrarily  rich;  they  can  be  returned  to  over  and  over  again,  and  analyzed  from  many 
different  perspectives.  A  typical  approach  is  to  make  videotapes  to  create  a  vivid  and  permanent 
data  library.  The  development  of  Minimalist  training  materials  and  software  environments,  cited 
earlier,  was  based  on  such  case  study  analysis  (Carroll,  1988b).  Mack's  (1987)  inventory  of  new 
user  expectations  about  cause  and  effect  relationships  in  the  operation  of  a  word  processor  was  a 
case  study  analysis  culminating  in  the  development  of  a  prototype  that  more  intuitively  presented 
word  processing  function. 

It  is  important  to  collect  information  over  a  significant  span  of  time  to  eliminate  ephemeral 
effects.  Monitoring  patterns  of  actual  use  of  a  software  environment  often  supplements  the  more 
direct interview and protocol techniques. Wixon, Whiteside, Good and Jones (1983) analyzed
patterns  of  spontaneous  interaction  with  an  electronic  mail  application  to  determine  how  to 
design  a  more  usable  command  interface  for  the  application.  Kelley  (1984)  analyzed  the  desk 
calendars  of  office  workers  to  determine  requirements  for  an  electronic  calendar  facility.  Gould 
and Boies and their collaborators have designed a series of voice messaging systems using this
approach (Gould and Boies, 1983; Gould, Boies, Levy, Richards and Schoonard, 1987).

The  key  goal  of  ecological  task  analysis  in  the  task-artifact  cycle  is  to  produce  requirements 
for  subsequent  design  work.  This  places  emphasis  on  identifying  big  factors  --  big  needs,  big 
usability  problems.  Thus,  one  typical  output  of  this  phase  is  an  error  taxonomy,  a  qualitative 
description  of  what  is  giving  the  user  trouble,  how  it  is  happening,  what  users  are  doing  in 
consequence,  etc.  The  complexity  and  rapid  evolution  of  software  technology  requires  richer  and 
more  open-ended  methods  than  the  direct  contrast  testing  of  the  human  factors  evaluation  and 
cognitive  description  approaches.  This  richer  style  of  task  analysis  is  interpretive,  inductive;  it 
seeks  to  discover,  not  merely  to  confirm  or  disconfirm. 

It  often  requires  studying  user  interface  technologies  and  applications  before  they  are  even 
developed;  after  all,  that's  the  point  at  which  empirical  guidance  can  be  most  effectively  directive 
(Carroll and Campbell, 1986). For obvious reasons, it is difficult to do such work, but a variety
of  simulation  techniques  have  been  developed.  For  example,  Gould,  Conti  and  Hovanyecz 
(1983)  simulated  a  speech  recognition  capability  to  explore  technological  tradeoffs  in  a  technology 
that  was  not  then  available.  Carroll  and  Aaronson  (1988)  analyzed  interactions  with  a  simulated 
intelligent  help  facility  to  help  direct  the  development  of  more  usable  artificial  intelligence 
applications. 

To help direct the task-artifact cycle, new types of usability data and new roles for usability
data  are  being  developed.  For  example,  since  the  ideas  that  lead  HCI  research  typically  become 
codified  in  products  first,  it  is  important  to  be  able  to  interpret  running  systems,  to  extract  key 
ideas  and  work  with  them.  Norman  (1984)  made  an  influential  psychological  interpretation  of 
key  aspects  of  the  Unix  operating  system.  Carroll  and  Mazur  (1986)  analyzed  new  user 
expectations  and  experiences  using  the  on-line  tutorial  and  direct  manipulation  interface  of  the 
Lisa  system.  Rosson  and  Alpert  (1988)  have  recently  analyzed  psychological  implications  of 
object-oriented design. Carroll, Mack and Kellogg (1988) outlined tools for analyzing user
interface  metaphors  in  design. 

Another  focus  for  the  development  of  tools  for  empirical  analysis  is  the  process  of  software 
and  system  development.  A  comprehensive  methodology  of  goal  definition  and  measurement  has 
been  developed  for  guiding  the  discovery  of  appropriate  usability  requirements  and  evaluating 
progress toward meeting these requirements within the design process (Bennett, 1984; Carroll and
Rosson, 1985; Whiteside, Bennett and Holtzblatt, 1988).

Usability-innervated  invention  offers  a  more  directive  role  in  framing  new  applications  and 
user  interfaces,  and  a  more  ecologically  responsive  role  for  empirical  work.  It  incorporates  and 
builds  upon  the  prior  orientations  of  human  factors  evaluation  and  cognitive  description,  but 
pushes onward in taking more seriously the fact that HCI is a design field, that it exists to invent
more usable systems and software. Earlier approaches to psychology in HCI had in effect isolated
the  task  analysis  part  of  the  task-artifact  cycle  from  the  definition,  development  and  first  use  of 
new  software  and  user  interface  technology,  because  of  preconceptions  about  the  kinds  of 
contributions psychologists might make to HCI. As a result, and in addition to a variety of
specific  limitations  discussed  above,  they  offered  only  commentary  on  the  process  and  products  of 
design,  not  participation. 

4.  The  Ecology  of  Computing 

The progression of three paradigms in the recent history of HCI comprises a case study of a
field discovering what it is about. HCI has achieved much by exploiting the context of its own
practice. It has assimilated the evaluation methodology of experimental psychology, the theory of
cognitive science, and the invention and development of new technology. Each step in this
evolution has solved some of the problems posed by the step preceding it.

The emerging paradigm of usability-innervated invention redresses the ecological limitations
of  direct  contrast  laboratory  evaluations  by  promoting  new  methods  and  new  roles  for  empirical 
evaluation.  It  redresses  the  theoretical  limitations  of  design  by  deduction  by  countenancing  richer 
sources and embodiments of scientific theory. This in turn has resolved other puzzles about HCI.
For example, the primacy of product development ideas in HCI research is puzzling only until it
is recognized that product development is a major context for HCI research: one of the important
roles of psychology in HCI is to provide interpretation and conceptual clarification for product
innovations. 

Even the mysterious race between function and usability dissolves: appropriately
contextualized HCI research cannot lag the technological leading edge; it lives at the technological
leading  edge;  indeed,  it  creates  the  technological  leading  edge.  For  example,  there  is  no  race 
between  usability  and  function  in  the  development  of  the  Rooms  display  management  system 
(Card  and  Henderson,  1987),  even  though  the  Rooms  approach  is  at  the  edge  of  our  current 
understanding  of  display  management  tasks  and  artifacts.  The  race  between  function  and  usability 
is simply an untoward side-effect of the organizational consequences of human factors evaluation
and  cognitive  description. 

Usability-innervated  invention  offers  a  new  basis  for  these  organizational  dynamics.  When 
the  basis  for  collaboration  is  evaluative  or  descriptive  commentary  offered  from  outside  the  design 
team,  the  grounds  are  frequently  political,  and  power-based,  or  interpreted  as  political  and 
power-based.  This  is  completely  unconstructive:  it  pushes  empirical  evaluation  and  psychological 
theory  further  away  from  invention.  Operating  within  the  task-artifact  cycle  as  task  analysts,  as 
inventors  of  artifacts,  offers  a  deeper  source  of  interdisciplinary  and  inter-organizational 
coordination:  shared  understanding  of  what  the  problems  are,  why  the  current  design  situation  is 
what  it  is,  what  the  immediate  and  longer-term  options  are,  and  how  they  trade  off.  It  offers  the 
alternative  of  committed,  cooperative  work. 

4.1  Science  and  invention 

There  is  a  conventional  view  of  the  relationship  between  scientific  research  and  the 
invention, design and development of practical artifacts. The idea is that basic science provides an
understanding of nature which can then be applied deductively in practical contexts. The
relationship between science and invention in HCI, as it has emerged through the course of the
last 15 years, is interesting from this standpoint in that it appears to be culminating (at least to this
point in time) somewhat unconventionally.

To  be  sure,  the  conventional  view  was  what  the  field  started  out  with:  the  vision  of  the 
human factors evaluation and cognitive description paradigms was to develop an empirical basis,
to develop a theoretical framework and finally to apply the theory deductively in design. Through
hard experience, HCI discovered that things were not this neat. Invention produces theory in
HCI at least as much as it applies theory and this has fundamentally altered the nature of the
empirical work. The resolution of this may lie in a countercurrent in the history of science,
questioning the conventional view itself. For example, Hindle (1981) analyzed a variety of 19th
century inventions and failed to find any deductive grounding in the basic science of the time.
Hindle suggests that the conventional view may have developed as recently as the 1850s in the
American  scientific  establishment  as  a  tactic  for  increasing  the  prestige  of  and  federal  support  for 
basic  research. 

Many  well-known  instances  of  invention  clearly  do  not  conform  to  the  conventional  view. 
The pulley, for example, had been used effectively for some 2,000 years before an adequate
scientific  analysis  of  its  operation  was  developed  within  Newtonian  mechanics.  The  violins  of  the 
17th  century  were  so  finely  crafted  that  their  design  was  merely  emulated  for  over  200  years. 
Indeed,  only  in  the  last  couple  of  decades  has  there  been  any  appreciable  acoustic  understanding 
of how violins really work (Hutchins, 1962). And it is not clear yet whether the science of
acoustics  itself  was  more  a  contributor  to  or  a  beneficiary  of  this  work. 

Of course, there is a relation between basic science and invention, but not a simple
deductive relation. Gomory (1983) puts the point well when he argues that the development of
technology  is  both  more  complex  and  less  predictable  than  the  basic  research  from  which  it  is 
seen  to  spring.  Gomory  discusses  the  first  150  years  of  technology  development  for  the  steam 
engine. He shows that the "revolutionary" engines of the mid-nineteenth century actually evolved
through many small steps, each relying on the chance availability of a technological niche, an
application in which the technology could survive and develop. The case study of HCI suggests
that  the  relation  between  basic  science  and  invention  can  be  highly  interactive  and  reciprocal. 
The  conventional  view  goes  wrong  in  trying  to  frame  this  relation  too  narrowly. 

It  is  a  commonplace  of  the  philosophy  of  science  since  positivism  to  observe  that  there  are 
no  "discovery  procedures,"  no  algorithms  to  carry  us  from  the  raw  material  of  empirical  science  to 
a  theoretical  explanation  of  that  raw  material.  A  way  to  put  this  point  is  to  say  analogously  that 
there are no "invention procedures": the logical leap from basic data and theory to the invention
and development of a usable artifact is neither more nor less deterministic than the step we are
more familiar with, namely the step from the raw material of experience to a theory of a
conventional sort. The applied science of the conventional view is a myth.

Psychology is a young science, so is Computer Science, so is Cognitive Science, and above
all,  so  is  HCI.  But  this  raises  the  question  of  whether  the  complex  and  reciprocal  interaction  of 
science and invention in HCI is attributable just to the youth of the relevant fields, to scientific
growing  pains  as  it  were.  In  view  of  this  possibility  it  is  relevant  to  consider  the  acoustic  analysis 
of  the  violin  as  conducted  over  the  past  40  years  by  members  of  the  Catgut  Society,  an 
interdisciplinary group of musicians, instrument craftsmen, physicists and engineers. Carleen Maley
Hutchins,  the  senior  member  of  this  team,  told  me  an  interesting  anecdote  about  an  early  stage  in 
her collaboration with Bell Labs physicists. The physicists' initial approach was to disassemble a
violin,  induce  sine  waves  and  measure  resulting  resonances. 

It's  a  beautiful  image;  it  recalls  the  direct  contrasts  of  human  factors  evaluation  and  the 
shallow  theories  of  cognitive  description.  It  recalls  models  of  error-free  user  behavior  as  bases  for 
understanding  how  to  design  usable  computer  systems  and  applications.  It  is  the  conventional 
strategy  of  divide  and  conquer,  which  too  often  requires  subtracting  out  the  essence  of  the 
problem  being  solved.  Inducing  pure  sine  waves  into  the  pieces  of  the  violin  to  measure  the 
resonances  is  not  an  adequate  approach  to  understanding  the  violin.  The  sound  to  which  a  real 
violin  responds  is  not  a  pure  sine  wave  and  it  is  not  induced;  it  is  a  complex  tone  produced  by 
bowing.  Moreover,  the  resonances  in  a  whole  violin  derive  both  from  the  parts  and  from  the 
composition  of  the  parts,  indeed  from  the  big  chunk  of  air  trapped  within  the  composition  of  the 
parts. Analyzing the parts does not add up to an understanding of the behavior of the whole.

The  point  is  not  that  these  idealized  acoustic  analyses  were  pointless.  Such  work  is 
on-going,  and  has  even  produced  techniques  useful  in  violin-making  (Hutchins,  1981).  And  the 
point  is  not  that  acoustic  science  has  nothing  to  offer  as  a  foundation  for  understanding  violins 
(bowing does not produce pure sine waves, but it does produce sound after all). The point is that
even  in  physics  the  initial  approach  to  applying  science  to  design  is  often  simplified  and 
inadequate,  whereas  the  effective  role  is  more  interactive  and  reciprocal.  Indeed,  the  comparison 
can be pushed much further; the research of the Catgut Society eventuated in the design and
development  of  a  new  set  of  stringed  instruments,  the  Violin  Octet.  The  analysis  could  go  only 
so  far  when  its  purview  was  an  account  of  the  standard  string  quartet  (which  acoustically  is  a  very 
accidental collection of instruments). To develop and assess laws of acoustic scaling, to test and
develop  claims  about  the  violin,  it  was  necessary  to  build  novel  instruments  (Hutchins,  1967; 
Hutchins  and  Schelleng,  1967). 

The  violin  is  intrinsically  a  very  appealing  example.  But  one  needn't  go  so  far.  Anyone  in 
the New York area recalls the renovation of Carnegie Hall. There was much concern and much
debate  about  the  impact  this  would  have  on  the  famous  acoustics  of  that  hall.  Acoustics,  the  old 
science  of  physics,  could  not  deductively  direct  or  predict  the  outcome.  Indeed,  to  this  day  the 
only fact that everyone agrees on is that the acoustics of Carnegie Hall are now different.

4.2  The  current  perplexity 

Failure  to  appreciate  the  subtleties  of  technology  development,  coupled  with  the  inherent 
limitations of the human factors evaluation and cognitive description paradigms of HCI and the
emergence  of  the  usability-innervated  invention  paradigm,  has  caused  substantial  perplexity  in  the 
field. One body of work has responded to Newell and Card's (1985) worry that psychology must
be  scientifically  hard  to  survive  in  HCI  by  retreating  into  the  study  of  low-level  phenomena  and 
of  highly  constrained  situations,  creating  a  very  insular  research  microcosm.  One  of  the  key  areas 
of its focus is replicating classic phenomena from the psychology of nonsense list learning (e.g.,
Polson, Kieras and Muncher, 1987). This approach flaunts all the limitations of the cognitive
description  paradigm.  It  is  not  at  all  clear  that  it  can  be  relevant  to  HCI  design  work. 

Another  body  of  work  has  rejected  psychology  as  a  totally  inappropriate  foundation  for 
design  work  in  HCI  (Whiteside  and  Wixon,  1987;  Winograd  and  Flores,  1986).  In  this  view, 
focussing  on  models  of  the  mind  and  conceiving  of  people  as  computational  devices  that  process 
inputs,  generate  goal  lists,  and  then  execute  plans  and  responses  all  merely  obscure  and  obstruct 
the  designer's  most  important  responsibility  and  objective:  to  understand  the  user's  needs  and 
wishes and to serve the user. This work flaunts the theoretical limitations of human factors
evaluation,  looking  to  hermeneutics  as  a  conceptual  foundation  for  design  and  emphasizing 
interpretations  that  are  unique  to  the  situation  and  to  the  individual  doing  the  interpreting,  and 
explicitly discouraging model-building or any form of abstraction. However, since it is bound to
particular  cases,  this  work  cannot  provide  any  framework  for  understanding  HCI  phenomena  as 
types. 

Both  approaches  are  dismal  in  prospect;  one  offering  no  hope  of  practical  impact  and  the 
other no hope of understanding. However, from the standpoint of the present discussion these
extreme  positions  have  despaired  too  quickly.  An  orderly  evolution  of  HCI  work  has  produced  a 
paradigm  that  builds  upon  the  genuine  contributions  of  human  factors  evaluation  and  cognitive 
description  and  at  the  same  time  redresses  their  limitations  with  respect  to  design  impact  and  the 
ecological  validity  of  empirical  work. 

HCI  has  often  been  described  as  an  "interdisciplinary"  research  area,  but  only  now  are  the 
full  interdisciplinary  possibilities  emerging.  Participating  fully  and  in  a  variety  of  roles  in  the 
evolution of computer technology offers psychologists in HCI a uniquely creative opportunity.
It's  a  demanding  opportunity.  Inventing  the  future  is  more  difficult  than  commenting  on  it. 
Pushing  psychological  theory  to  interpret  and  analyze  new  technological  situations  and 
embodying psychological claims and results in HCI artifacts is not easier than evaluating finished
systems,  computing  t-tests  and  calculating  performance  times.  But  then  one  does  not  move  to 
the frontier for the comforts of familiarity. The possibility and the challenge of HCI today is to
move  forward  to  new  roles  and  new  ideas  in  technology  and  science. 